Multilingual Data Configurable Text-to-Speech System for Embedded Devices
Authors
Abstract
This paper presents a low-footprint multilingual text-to-speech (ML-TTS) framework. The system is part of a speaker-independent name dialing system that has been introduced in Nokia Series 60 mobile phones. ML-TTS systems based on the Klatt88 engine usually contain sets of language-specific rules that modify the speech synthesis parameters. When these rules are implemented as program code, the code size grows as the number of languages increases, and adding TTS support for a new language is difficult: it requires modifying the source code, which is error-prone and time-consuming. The paper presents a novel scheme that both alleviates the memory problems and makes language development easier than in typical existing solutions. In this framework, the language-dependent TTS rules are written in a scripting language and stored in text files, one file per language. The files are converted into binary form, so the rules are implemented as data rather than code. With this approach, only the data of the active language needs to be kept in memory, and a single data file typically remains small. During synthesis, an interpreter processes the rules and modifies the synthesis parameters accordingly. Adding TTS support for a new language involves writing a new set of language-specific rules; ideally, no modifications to the TTS engine code are needed. In addition to the language-specific rules, all other language-dependent information, such as the prosodic model, is stored in the binary file, i.e. the language package. Because of the language packages, the TTS engine can be configured for any desired set of languages simply by preparing and providing the associated language packages.
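The rules-as-data idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the scripting syntax, the binary layout, or the parameter set, so the one-rule-per-line format, the opcodes (SET, ADD, SCALE), the parameter names (F0, F1, F2, DUR) and the 7-byte record are hypothetical. The sketch only shows the general pattern of compiling a per-language text file into a binary package and interpreting it at synthesis time.

```python
import struct

# Hypothetical opcodes and parameter slots; the paper's real rule
# language and Klatt parameter set are not given in the abstract.
OPS = {"SET": 0, "ADD": 1, "SCALE": 2}
PARAMS = {"F0": 0, "F1": 1, "F2": 2, "DUR": 3}

# Assumed record layout: phoneme id, opcode, parameter id, float32 operand.
RECORD = struct.Struct("<BBBf")


def compile_rules(text, phonemes):
    """Convert a rule script (one rule per line) into a binary package."""
    out = bytearray()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        phoneme, op, param, value = line.split()
        out += RECORD.pack(phonemes[phoneme], OPS[op], PARAMS[param], float(value))
    return bytes(out)


def apply_rules(package, phoneme_id, params):
    """Interpret the binary rules for one phoneme; return modified parameters."""
    result = list(params)
    for pid, op, param, value in RECORD.iter_unpack(package):
        if pid != phoneme_id:
            continue
        if op == OPS["SET"]:
            result[param] = value
        elif op == OPS["ADD"]:
            result[param] += value
        elif op == OPS["SCALE"]:
            result[param] *= value
    return result


# Illustrative rule file for one made-up language.
RULES = """
a SET F1 700      # raise first formant for /a/
a SCALE DUR 1.5   # lengthen /a/
i ADD F0 10       # slight pitch raise for /i/
"""
PHONEMES = {"a": 0, "i": 1}

package = compile_rules(RULES, PHONEMES)
defaults = [120.0, 500.0, 1500.0, 80.0]  # F0, F1, F2, DUR
print(apply_rules(package, PHONEMES["a"], defaults))
```

In this sketch the engine code (`apply_rules`) stays the same for every language; switching languages means loading a different `package`, which mirrors the configuration by language package described in the abstract.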
Related resources
A scalable architecture for multilingual speech recognition on embedded devices
In-car infotainment and navigation devices are typical examples where speech based interfaces are successfully applied. While classical applications are monolingual, such as voice commands or monolingual destination input, the trend goes towards multilingual applications. Examples are music player control or multilingual destination input. As soon as more languages are considered the training a...
Online generation of acoustic models for multilingual speech recognition
Our goal is to provide a multilingual speech based Human Machine Interface for in-car infotainment and navigation systems. The multilinguality is for example needed for music player control via speech as artist and song names in the globalized music market come from many languages. Another frequent use case is the input of foreign navigation destinations via speech. In this paper we propose app...
Speech data collection in an under-resourced language within a multilingual context
In this paper, we present an end-to-end solution to the development of an automatic speech recognition (ASR) system in typical under-resourced languages, where the target language is likely to be influenced by one or more embedded foreign languages. We first describe the collection and processing of the text corpus crawled from the World Wide Web using the Rapid Language Adaptation Toolkit. In par...
An embedded and concatenative approach to TTS of multiple languages
This paper presents an embedded and concatenative approach to multilingual text-to-speech system (ECMTTS). Under a uniform architecture, the TTS modules are separated into language dependent and independent ones. A specifically defined super phonetic symbol set enables to use uniform spee...
Multilingual text-to-phoneme mapping
This paper introduces a novel approach for generating multilingual text-to-phoneme mappings for use in multilingual speech recognition systems. The multilingual mappings are based on the weighted outputs from a neural network text-to-phoneme model, trained on data mixed from several languages. The multilingual mappings used together with a branched grammar decoding scheme is able to capture bot...
Publication date: 2006